Multiple regression is formally known as the ordinary multiple linear regression model.
What a mouthful! Here’s what the terms mean:
Ordinary: The outcome variable is a continuous numerical variable whose random fluctuations
are normally distributed (see Chapter 24 for more about normal distributions).
Multiple: The model has more than one predictor variable.
Linear: Each predictor variable is multiplied by a parameter, and these products are added
together to estimate the predicted value of the outcome variable. You can also have one more
parameter thrown in that isn’t multiplied by anything; it’s called the constant term or the
intercept. The following are examples of linear functions used in regression:
Y = a + bX (This is the straight-line model from Chapter 16, where X is the predictor
variable, Y is the outcome, and a and b are parameters.)
Y = a + bX + cX² + dZ³ (In this multiple regression model, variables can be squared or
cubed. But as long as they’re multiplied by a coefficient, which is a slope from the model,
and the products are added together, the function is still considered linear in the
parameters.)
Y = a + bX + cZ + dXZ (This multiple regression model is special because of the XZ term,
which can be written as X × Z, and is called an interaction. It’s where you multiply two
predictors together to create a new interaction term in the model.)
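To make the interaction model concrete, here’s a minimal sketch in Python that fits Y = a + bX + cZ + dXZ by least squares. The data values are made up purely for illustration, and NumPy’s lstsq routine stands in for whatever fitting procedure your statistical software uses.

```python
import numpy as np

# Made-up illustrative data: outcome Y, predictors X and Z.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Z = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y = np.array([3.1, 4.9, 9.2, 10.8, 17.1, 18.0])

# Design matrix for Y = a + bX + cZ + dXZ: a column of 1s for the
# intercept a, then X, Z, and the interaction column X*Z.
design = np.column_stack([np.ones_like(X), X, Z, X * Z])

# Least-squares fit; coefs holds the estimates of a, b, c, and d.
coefs, *_ = np.linalg.lstsq(design, Y, rcond=None)
a, b, c, d = coefs
print(f"a = {a:.2f}, b = {b:.2f}, c = {c:.2f}, d = {d:.2f}")
```

Notice that the interaction column is literally X multiplied by Z, row by row, and that dropping the column of 1s would give you a no-intercept model like the Y = 0 + X + Z + X * Z form described below.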
In textbooks and published articles, you may see regression models written in various ways:
A collection of predictor variables may be designated by a subscripted variable and the
corresponding coefficients by another subscripted variable, like this:
Y = b₀ + b₁X₁ + b₂X₂ + b₃X₃ + …
In practical research work, the variables are often given meaningful names, like Age, Gender,
Height, Weight, Glucose, and so on.
Linear models may be represented in a shorthand notation that shows only the variables, and not
the parameters, like this: Y = X + Z + X * Z instead of Y = a + bX + cZ + dX * Z. Writing Y = 0 + X +
Z + X * Z specifies that the model has no intercept. And sometimes you’ll see a “~” instead of the
“=”. If you do, read the “~” as “is a function of” or “is predicted by.”
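If your statistical software accepts this kind of formula notation, the “~” shorthand maps directly onto code. The following sketch uses Python’s statsmodels formula interface with a hypothetical data frame; the variable names and values are made up for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data with meaningful variable names (values are made up).
df = pd.DataFrame({
    "Weight": [68, 75, 82, 90, 77, 85, 73, 88],
    "Height": [165, 172, 180, 188, 170, 178, 169, 185],
    "Age":    [34, 41, 29, 52, 46, 38, 55, 31],
})

# "Weight ~ Height + Age" reads as "Weight is predicted by Height and Age."
# The intercept is included automatically; "Weight ~ 0 + Height + Age"
# would drop it, matching the Y = 0 + X + Z form in the text.
model = smf.ols("Weight ~ Height + Age", data=df).fit()
print(model.params)  # intercept plus one coefficient per predictor
```

In this notation, writing Height*Age expands to Height + Age + Height:Age, so the asterisk gives you both main effects and the interaction term in one stroke.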
Being aware of how the calculations work
Fitting a linear multiple regression model essentially involves creating a set of simultaneous equations,
one for each parameter in the model. The equations involve the parameters from the model and the
sums of various products of the dependent and independent variables. This is also true of the
simultaneous equations for the straight-line regression in Chapter 16, which involve estimating the
slope and intercept of the straight line and the sums of X, Y, X², and XY. Your statistical software
solves these simultaneous equations to obtain the parameter values, just as is done in straight-line
regression.
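As a rough sketch of what happens under the hood, the code below builds those simultaneous equations (often called the normal equations) from sums of squares and cross-products, then solves them directly. The data are simulated so the true parameters are known, and NumPy stands in for the internals of your statistical package.

```python
import numpy as np

# Simulated data with known "true" parameters: intercept 2.0,
# slopes 1.5 and -0.7, plus a little random noise.
rng = np.random.default_rng(0)
X1 = rng.normal(size=20)
X2 = rng.normal(size=20)
Y = 2.0 + 1.5 * X1 - 0.7 * X2 + rng.normal(scale=0.1, size=20)

# Design matrix: a column of 1s for the intercept, then the predictors.
design = np.column_stack([np.ones(20), X1, X2])

# The normal equations (X'X)b = X'Y are built from sums of squares and
# cross-products of the variables; there is one equation per parameter.
xtx = design.T @ design
xty = design.T @ Y

# Solving the three simultaneous equations gives the parameter estimates.
params = np.linalg.solve(xtx, xty)
print(params)  # close to [2.0, 1.5, -0.7]
```

With three parameters (one intercept and two slopes), this is a system of three equations in three unknowns; each predictor you add to the model adds one more equation to the system.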